This database contains wordlists collected as part of the Daghestanian loans project by the Linguistic Convergence Laboratory at NRU HSE. The 160-item shortlist, which is based on the World Loanword Database questionnaire, is designed to measure lexical contact on a micro-level: to quantify lexical convergence among the speech communities of minority languages at the village level, and to detect fine-grained areal patterns beyond general observations about the spheres of influence of certain languages.

Contents:

##               [,1]
## target_words 25633
## languages       23

The database

For now, the table shows source Concepts and target Words. Each target word is assigned to a similarity Set: a set of words that have the same meaning and look similar. Data on borrowing sources will be added in the future. Metadata includes the name of the Village where the word was recorded, the administrative District it belongs to, the Language spoken there, and the List ID. Each List ID corresponds to a particular speaker or, in some cases, to a written source such as a dictionary. Data is accessible at: Github/LingConLab/DagloanDatabase.
The dataset in the dummy format is available here.
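
As a rough illustration of how the similarity Sets could be used to quantify convergence, the sketch below computes, for two lists, the share of concepts for which both lists contain a word from the same similarity Set. This is only one possible measure and is not necessarily the one used by the project. The rows are invented toy data, and the column names (concept, word, set, village, language, list_id) are assumptions based on the description above; the headers in the actual file may differ.

```python
from collections import defaultdict

# Toy rows invented for illustration; they are NOT taken from the database.
# Column names are assumptions modelled on the table description.
rows = [
    {"concept": "bucket", "word": "vedro",  "set": 1, "village": "Village A", "language": "Language X", "list_id": "A1"},
    {"concept": "bucket", "word": "vedro",  "set": 1, "village": "Village B", "language": "Language Y", "list_id": "B1"},
    {"concept": "bucket", "word": "form_c", "set": 2, "village": "Village C", "language": "Language X", "list_id": "C1"},
    {"concept": "knife",  "word": "form_d", "set": 1, "village": "Village A", "language": "Language X", "list_id": "A1"},
    {"concept": "knife",  "word": "form_e", "set": 2, "village": "Village B", "language": "Language Y", "list_id": "B1"},
    {"concept": "knife",  "word": "form_d", "set": 1, "village": "Village C", "language": "Language X", "list_id": "C1"},
]


def sets_by_list(rows):
    """Map each list ID to {concept: similarity-set IDs attested in that list}."""
    index = defaultdict(lambda: defaultdict(set))
    for row in rows:
        index[row["list_id"]][row["concept"]].add(row["set"])
    return index


def shared_set_ratio(index, list_a, list_b):
    """Share of concepts attested in both lists that have a similarity set in common.

    Returns None when the two lists have no concept in common.
    """
    concepts_a = index.get(list_a, {})
    concepts_b = index.get(list_b, {})
    common = set(concepts_a) & set(concepts_b)
    if not common:
        return None
    # Sets are compared within a concept, so set IDs may be reused across concepts.
    shared = sum(1 for concept in common if concepts_a[concept] & concepts_b[concept])
    return shared / len(common)


index = sets_by_list(rows)
print(shared_set_ratio(index, "A1", "B1"))  # lists A1 and B1 share 'bucket' but not 'knife'
```

Comparing sets within a concept, rather than comparing raw word forms, means that small phonetic differences between villages do not count as distinct words, which matches the purpose of the similarity Sets.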


Version: 2019-03-29. For questions or comments contact jh.verhees@gmail.com.


Map of the surveyed villages

Hover over or click a dot on the map to learn more. The color of each dot corresponds to the number of lists collected in that village. Orange = dictionary data.

Sample lexical map

The map below shows the distribution of different stems for the concept ‘bucket’. In most cases the Russian word vedro is borrowed.